Combination of Multiple Speech Transcription Methods for Vocabulary Independent Search
Abstract
Today, most systems use large-vocabulary continuous speech recognition (LVCSR) tools to produce word transcripts; these transcripts are indexed, and query terms are retrieved from the index. However, query terms that are not part of the recognizer's vocabulary (out-of-vocabulary, or OOV, terms) cannot be retrieved, which lowers the recall of the search. Such terms can be retrieved using phonetic search methods. Phonetic transcripts can be generated by expanding the word transcripts into phones using the baseforms in the dictionary. In addition, advanced systems can provide phonetic transcripts using sub-word-based language models. However, these phonetic transcripts are less accurate and are not a good substitute for word transcripts. We demonstrate how to retrieve information from speech data with a novel approach to vocabulary-independent retrieval that combines search over transcripts produced by different word and sub-word decoding methods. We present two algorithms: the first is based on the Threshold Algorithm (TA); the second uses a Boolean retrieval model on inverted indices. The value of this combination is demonstrated on data from the NIST 2006 Spoken Term Detection evaluation.
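The abstract names the Threshold Algorithm (TA) as one way to merge the result lists retrieved from the word and sub-word transcripts. The sketch below is a generic Python implementation of Fagin's TA, not the authors' code. It assumes that each transcript index returns a list of `(result_id, score)` pairs sorted by descending score, that a result missing from a list scores 0.0 there, and that scores are combined with a monotone function (plain `sum` by default). The `threshold_algorithm` name, the `utt*` identifiers, and the scores are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]  # (result_id, score), best score first


def threshold_algorithm(
    lists: Sequence[Ranked],
    k: int,
    agg: Callable[[Sequence[float]], float] = sum,
) -> Ranked:
    """Return the k results with the highest aggregate score (Fagin's TA).

    Each input list must be sorted by descending score. A result missing
    from a list is assumed to score 0.0 in that list, and `agg` must be
    monotone (e.g. sum, or a weighted sum with non-negative weights).
    """
    if k <= 0:
        return []
    # Dictionaries give "random access" to any result's score in each list.
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap holding the best k so far
    depth = 0
    while True:
        last_scores: List[float] = []
        any_left = False
        for lst in lists:
            if depth < len(lst):  # sorted access at the current depth
                any_left = True
                rid, score = lst[depth]
                last_scores.append(score)
                if rid not in seen:
                    seen.add(rid)
                    total = agg([d.get(rid, 0.0) for d in lookup])
                    if len(top) < k:
                        heapq.heappush(top, (total, rid))
                    elif total > top[0][0]:
                        heapq.heapreplace(top, (total, rid))
            else:
                # Exhausted list: every unseen result scores 0.0 there.
                last_scores.append(0.0)
        depth += 1
        # No unseen result can beat the aggregate of the last scores read.
        threshold = agg(last_scores)
        if not any_left or (len(top) == k and top[0][0] >= threshold):
            break
    return sorted(((rid, s) for s, rid in top), key=lambda p: (-p[1], p[0]))


if __name__ == "__main__":
    # Hypothetical hit lists for one query term from two transcript indices.
    word_hits = [("utt3", 0.9), ("utt1", 0.6), ("utt7", 0.2)]
    phone_hits = [("utt1", 0.8), ("utt5", 0.7), ("utt3", 0.3), ("utt7", 0.1)]
    for rid, score in threshold_algorithm([word_hits, phone_hits], k=2):
        print(rid, round(score, 2))
```

With these inputs, the sketch returns `utt1` (1.4) and then `utt3` (1.2). It stops after three rounds of sorted access, before reaching the last entry of `phone_hits`. This early termination is why TA suits merging long posting lists. A weighted `agg` (for example, trusting the word transcript more than the phonetic one) can be passed in without changing the algorithm, as long as the weights are non-negative.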
Similar resources
A hybrid word / phoneme-based approach for improved vocabulary-independent search in spontaneous speech
For efficient organization of speech recordings (meetings, interviews, voice mails, and lectures), being able to search for spoken keywords is essential. Today, most spoken document retrieval systems use large-vocabulary recognition. For the above scenarios, such systems suffer from the unpredictable domain, out-of-vocabulary queries, and a generally high word error rate (WER). In [1], we present...
Strategies for high accuracy keyword detection in noisy channels
We present design strategies for a keyword spotting (KWS) system that operates in highly degraded channel conditions with very low signal-to-noise ratio levels. We employ a system combination approach by combining the outputs of multiple large vocabulary automatic speech recognition (LVCSR) systems, each of which employs a different system design approach targeting three different levels of inf...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving this archive, is one of the key issues for this broadcasting...
Breadth-first search for finding the optimal phonetic transcription from multiple utterances
Extending the vocabulary of a large-vocabulary speech recognition system usually requires the phonetic transcriptions of all words to be known. With automatic phonetic baseform determination, acoustic samples of the words in question can substitute for the required expert knowledge. In this paper we follow a probabilistic approach to this problem and present a novel breadth-first search algorithm...
Keyword recognition and extraction by multiple-LVCSRs with 60,000 words in speech-driven WEB retrieval task
This paper presents speech-driven Web retrieval models that accept spoken search topics (queries) in the NTCIR-3 Web retrieval task. We experimentally evaluate techniques for combining the outputs of multiple LVCSR models with a language model (LM) with a 60,000-word vocabulary in the recognition of spoken queries. As the model combination technique, we use SVM learning. We show that the technique...